Unsupervised record matching with noisy and incomplete data

Similar resources

Unsupervised record matching with noisy and incomplete data

We consider the problem of duplicate detection: given a large data set in which each entry has multiple attributes, detect which distinct entries refer to the same real world entity. Our method consists of three main steps: creating a similarity score between entries, grouping entries together into ‘unique entities’, and refining the groups. We compare various methods for creating similarity sc...
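
The three steps named in this abstract (similarity scoring, grouping into unique entities, refinement) can be illustrated with a minimal sketch. It is not the paper's method: the attribute-averaged `difflib` similarity, the union-find grouping over a fixed threshold, the `refine` rule that splits off weakly connected members, and all function names and thresholds are assumptions chosen for illustration. Missing attributes (`None`) are skipped rather than scored, which is one simple way to tolerate incomplete records.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical helper names and thresholds; not taken from the paper.


def similarity(a, b):
    """Mean per-attribute string similarity, skipping attributes missing in either record."""
    scores = [
        SequenceMatcher(None, x.lower(), y.lower()).ratio()
        for x, y in zip(a, b)
        if x is not None and y is not None
    ]
    return sum(scores) / len(scores) if scores else 0.0


def group(records, threshold):
    """Union-find over all pairs whose similarity meets the threshold."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if similarity(records[i], records[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def refine(records, groups, min_avg):
    """Split off members whose mean similarity to the rest of their group is below min_avg."""
    refined = []
    for g in groups:
        if len(g) < 3:  # pairs already passed the threshold directly
            refined.append(g)
            continue
        keep, drop = [], []
        for i in g:
            avg = sum(similarity(records[i], records[j]) for j in g if j != i) / (len(g) - 1)
            (keep if avg >= min_avg else drop).append(i)
        refined.append(keep)
        refined.extend([i] for i in drop)
    return [g for g in refined if g]


# Toy records: (name, address, phone); None marks a missing attribute.
records = [
    ("John Smith", "12 Oak St", "555-1234"),
    ("Jon Smith", None, "555-1234"),
    ("Mary Jones", "4 Elm Ave", None),
]
result = refine(records, group(records, 0.8), 0.7)
print(result)  # [[0, 1], [2]]
```

With these toy records the first two entries (a misspelled name, a missing address, the same phone number) are merged and the third stays on its own. Grouping by connected components lets chains of pairwise matches merge records that are not directly similar, which is one reason a separate refinement pass can help.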

Hyperspectral Unmixing from Incomplete and Noisy Data

In hyperspectral images, once the pure spectra of the materials are known, hyperspectral unmixing seeks to find their relative abundances throughout the scene. We present a novel variational model for hyperspectral unmixing from incomplete noisy data, which combines a spatial regularity prior with the knowledge of the pure spectra. The material abundances are found by minimizing the resulting c...

Adaptive Approximate Record Matching

Typographical data entry errors and incomplete documents produce imperfect records in real-world databases. These errors generate distinct records that belong to the same entity. The aim of Approximate Record Matching is to find the multiple records that belong to a single entity. In this paper, an algorithm for Approximate Record Matching is proposed that can be adapted automatically with input error...

Unsupervised Blocking of Imbalanced Datasets for Record Matching

Record matching in data engineering refers to searching for data records originating from the same entities across different data sources. Solutions for record matching usually employ learning algorithms to train a classifier that labels record pairs as either matches or non-matches. In practice, the number of non-matches typically far exceeds the number of matches. This problem is the so-called imb...
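
Blocking is a common way to cut the quadratic number of record pairs before classification, and it also shifts the match/non-match ratio described above. The sketch below shows plain key-based (standard) blocking, not the unsupervised blocking method of this paper; the records, the `blocking_key` function (ZIP code plus first letter of the last name), and the labeled matches are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def all_pairs(n):
    """Number of unordered record pairs without blocking: n(n-1)/2."""
    return n * (n - 1) // 2


def block(records, key):
    """Standard blocking: only records that share a blocking key form candidate pairs."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[key(rec)].append(idx)
    return [pair for ids in blocks.values() for pair in combinations(ids, 2)]


def blocking_key(rec):
    # Hypothetical key: ZIP code plus first letter of the last name.
    return rec["zip"], rec["name"].split()[-1][0].lower()


# Toy data (hypothetical); true_matches holds index pairs of duplicate records.
records = [
    {"name": "John Smith", "zip": "10001"},
    {"name": "Jon Smith", "zip": "10001"},
    {"name": "Mary Jones", "zip": "94105"},
    {"name": "Marie Jones", "zip": "94105"},
    {"name": "Ana Ruiz", "zip": "60601"},
    {"name": "Will Stone", "zip": "10001"},
]
true_matches = {(0, 1), (2, 3)}

candidates = block(records, blocking_key)
total = all_pairs(len(records))
matches_kept = len(true_matches & set(candidates))

print(f"all pairs: {total} ({total - len(true_matches)} non-matches)")
print(f"candidates: {len(candidates)} ({len(candidates) - matches_kept} non-matches)")
print(f"match recall after blocking: {matches_kept / len(true_matches):.2f}")
```

On this toy data blocking keeps both true matches while the non-match to match ratio drops from 13:2 over all 15 pairs to 2:2 over 4 candidates. A key that is too strict (for example, one built on a misspelled attribute) can place true duplicates in different blocks, so blocking trades some recall for fewer comparisons.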

PSOM+: Parametrized Self-Organizing Maps for noisy and incomplete data

We present an extension to the Parametrized Self-Organizing Map that allows the construction of continuous manifolds from noisy, incomplete and not necessarily grid-organized training data. All three problems are tackled by minimizing the overall smoothness of a PSOM manifold. For this, we introduce a matrix which defines a metric in the space of PSOM weights, depending only on the underlying gr...

Journal

Journal title: International Journal of Data Science and Analytics

Year: 2018

ISSN: 2364-415X, 2364-4168

DOI: 10.1007/s41060-018-0129-7